Abstract — Unattended ground sensors (UGS) are widely used
to monitor human activities, such as pedestrian motion and
detection of intruders in a secure region. Efficacy of UGS systems
is often limited by high false alarm rates, possibly due to
inadequacies of the underlying algorithms and limitations of onboard
computation.  This  paper  presents  a  symbolic  method  of  feature 
extraction  and  sensor  fusion,  which  is  built  upon  the  principles  of 
wavelet  transform  and  probabilistic  finite  state  automata  (PFSA). 
The  relational  dependencies  among  heterogeneous  sensors  are 
modeled  by  cross-PFSA,  from  which  low-dimensional  feature 
vectors  are  generated  for  pattern  classification  in  real  time.  The 
proposed  method  has  been  validated  on  data  sets  of  seismic 
and  passive  infrared  (PIR)  sensors  for  target  detection  and 
classification.  The  proposed  method  has  the  advantages  of  fast 
execution  time  and  low  memory  requirements  and  is  potentially 
well-suited  for  real-time  implementation  with  onboard  UGS 
systems. 

Index  Terms — Personnel  detection,  multimodal  sensor  fusion, 
feature  extraction,  seismic  sensor,  PIR  sensor 

I.  Introduction 

Unattended  ground  sensors  (UGS)  are  widely  used  in 
industrial  monitoring  and  military  operations.  Such  UGS 
are  usually  lightweight  devices  that  automatically  monitor 
the  local  activities  in-situ,  and  transfer  target  detection  and 
classification  reports  to  some  higher  level  processing  center. 
Commercially  available  UGS  systems  make  use  of  multiple 
sensing  modalities  (e.g.,  acoustic,  seismic,  passive  infrared, 
magnetic,  electrostatic,  and  video).  Efficacy  of  UGS  systems 
is  often  limited  by  high  false  alarm  rates  because  the  onboard 
data  processing  algorithms  may  not  be  able  to  correctly 
discriminate  different  types  of  targets  (e.g.,  humans  from 
animals) [1]. Power consumption is a critical consideration in
UGS systems. Therefore, power-efficient sensing modalities,
low-power signal processing algorithms, and efficient methods
for exchanging information between the UGS nodes are
needed [2].

In a personnel detection problem, the targets usually include
humans, vehicles, and animals. Discriminating human footstep
signals from other targets and noise sources is a challenging
problem, because the signal-to-noise ratio (SNR) of footsteps
decreases  rapidly  with  the  distance  between  the  sensor  and 
the  pedestrian.  Furthermore,  the  footstep  signals  may  vary 
significantly  for  different  persons  and  environments. 

Seismic sensors are widely used for personnel detection,
because they are less sensitive to Doppler effects and
environmental variations than acoustic sensors [3].
Current personnel detection methods using seismic signals can
be classified into three categories, namely, time domain
methods [4], frequency domain methods [5], and time-frequency
domain methods [3], [6]. Recent research has relied on
time-frequency domain methods, such as wavelet transform-based
methods. Passive infrared (PIR) sensors are widely used for
motion detection, and are well-suited for UGS systems due
to low power consumption. PIR sensors have been reported
for moving target detection and localization [7]; however,
similar effort for target classification has not been reported
in the open literature, although PIR sensor signals also contain
discriminative information in the time-frequency domain.

Collaborative target detection and classification using
multimodal sensor fusion can improve the overall performance
because the heterogeneous sensors can complement each other.
Sensor fusion can be implemented at different levels: data-level
fusion, feature-level fusion, and decision-level fusion.
The Kalman filter is widely used for data-level fusion;
Dempster-Shafer evidence theory and Bayesian networks are widely
used for decision-level fusion [8], [9]. Data-level fusion has
the least information loss, but it may be computationally
expensive and vulnerable to sensor degradation. Some of
these concerns can be alleviated by decision-level fusion, in
which detection/classification is performed separately on the data
of each sensor and the individual decisions are then combined.
In principle, decision-level fusion is suboptimal: if a
target is not detected by all sensors, it will not experience
the full benefits of fusion [10].

This  paper  introduces  a  feature-level  fusion  method  to 
address  these  issues.  Symbolic  Dynamic  Filtering  (SDF)  is  a 
data-driven  feature  extraction  tool  built  upon  the  concepts  of 
Symbolic  Dynamics  and  Probabilistic  Finite  State  Automata 
(PFSA) [11], [12]. In SDF, the sensor data are first partitioned
into symbol sequences, and then PFSA are constructed as
the representation of the underlying dynamics in the data.
A feature-level fusion approach built under the framework
of SDF has been proposed in [13] for fault diagnosis in
aircraft engines. The time series data from different sensors
are partitioned into symbol sequences from which the
cross-PFSA, called cross D-Markov machine [13] (denoted as
‘xD-Markov machine’ in the sequel), is constructed. However, the
performance  of  this  method  may  degrade  significantly  if  the 
SNR  decreases.  For  analysis  of  noisy  sensor  data,  this  paper 
extends  the  concept  of  xD-Markov  machines  by  introducing 
wavelet surface partitioning [14] as an alternative to time
series partitioning in the original concept in [13]. In the
proposed method, images of wavelet-transformed time series
are partitioned for conversion into symbol sequences.
Subsequently, xD-Markov machines are constructed from symbol
sequences of heterogeneous sensors to compress the pertinent
information into low-dimensional statistical patterns. The
proposed feature extraction algorithm mitigates the detrimental
effects of spurious noise by using wavelet analysis, captures
the essential signatures from the time-frequency domain of
the signals, and generates low-dimensional feature vectors for
pattern classification.

The proposed method is validated on data collected from
seismic sensors and PIR sensors for the purpose of personnel
detection in border areas. Performance of information fusion
from  seismic  and  PIR  sensors  is  compared  with  the  results 
obtained  from  single-modal  sensors. 

II.  Problem  Description  and  Formulation 

The  problem  at  hand  is  to  detect  and  classify  different 
targets  (e.g.,  humans  and  animals),  where  seismic  and  PIR 
sensors are used to capture the respective characteristic
signatures. For example, in the movement of a human or an animal
across  the  ground,  oscillatory  motions  of  the  body  appendages 
provide  the  respective  characteristic  signatures. 

The  seismic  and  PIR  sensor  data,  used  in  this  analysis, 
were  collected  on  multiple  days  from  test  fields  on  a  wash 
(i.e.,  the  dry  bed  of  an  intermittent  creek)  and  at  a  choke 
point  (i.e.,  a  place  where  the  targets  are  forced  to  go  due 
to  terrain  difficulties).  During  multiple  field  tests,  sensor  data 
were collected for several scenarios that consisted of targets
walking along a trail approximately 150 m long, and
returning along the same trail to the starting point. Figure 1
illustrates  a  typical  data  collection  scenario. 

The  targets  consisted  of  people  (e.g.,  male  and  female 
humans)  and  animals  (e.g.,  donkeys,  mules,  and  horses). 
The  humans  walked  alone  and  in  groups  with  and  without 
backpacks;  the  animals  were  led  by  their  human  handlers 
and  they  made  runs  with  and  without  payloads.  There  were 
three  sensor  sites,  each  equipped  with  acoustic  and  seismic 
sensors.  The  seismic  sensors  were  buried  approximately  15  cm 
deep  underneath  the  soil  surface,  and  the  PIR  sensors  were 
collocated  with  the  respective  seismic  sensors.  All  targets 
passed  by  the  sensor  sites  at  a  distance  of  approximately 
5  m.  Signals  from  both  sensors  were  acquired  at  a  sampling 
frequency  of  10  kHz. 


Figure  2.  Tree  structure  formulation  of  the  detection  &  classification  problem 

The  tree  structure  in  Fig.  2  shows  how  the  detection  and 
classification  problem  is  formulated.  In  the  detection  stage, 
the  pattern  classifier  detects  the  presence  of  a  moving  target 
against the null hypothesis of no target present; in the
classification stage, the pattern classifier discriminates among
different targets. While the detection system should be robust to
satisfy  the  specifications  of  false  alarm  rates,  the  classification 
system  must  be  sufficiently  sensitive  to  discriminate  between 
different  classes  of  targets  with  high  fidelity.  In  this  context, 
feature  extraction  plays  an  important  role  in  target  detection 
and  classification,  because  the  performance  of  the  classifier 
largely  depends  on  the  quality  of  the  extracted  features. 

III.  Semantic  Framework  of  Sensor  Fusion 

A (three-layered) hierarchical semantic framework is
presented in this paper for the purpose of multi-sensor data
interpretation and fusion. In this framework, patterns discovered
from  individual  sensors  are  called  atomic  patterns  (AP),  while 
patterns  discovered  from  the  relational  dependency  between 
two  sensors  are  called  relational  patterns  (RP)  [13]. 

Let $L = \{\mathcal{L}_1, \mathcal{L}_2, \ldots, \mathcal{L}_n\}$ be the universal set of atomic
patterns. The atomic pattern library $L$ is the set of modal footprints
identified from individual sensing modalities for targets of
different classes. Given the atomic pattern library, a popular
framework for addressing information fusion is the
set-theoretic approach. In this framework, higher level
patterns or contexts are modeled as subsets of $L$. Thus a
composite pattern, resulting from fusion of atomic patterns,
is a collection of atomic patterns from $L$ and the resulting
library of composite patterns is a subset of the power set of the
atomic pattern library, i.e., $L^* \subset 2^{L}$. However, a disadvantage
of this approach is that it considers only modal footprints for
constructing composite patterns as a bag of atomic patterns;
relational dependencies between patterns are disregarded.

Since the relational dependencies cannot be ignored in
many practical problems, a hierarchical semantic framework
for multi-sensor data interpretation and fusion is proposed in
this paper. It applies a common approach to information
fusion at different layers of the hierarchy and includes
relational dependencies in the composite pattern representation.
The middle layer deals with the relational dependencies
among atomic patterns, where relationships are modeled as
the cross-dependencies among data streams from different
sensors. These cross-dependencies are discovered via
relational PFSA that essentially capture the dynamics of state
transition in one symbol sequence (e.g., obtained from one
sensor) corresponding to a symbol appearance in the second
symbol sequence (e.g., obtained from another sensor). Loose
time-synchronization between sensor observations should be
adequate for this purpose. Symbol-level cross-dependencies
among modalities are exploited to mitigate information loss.

Figure 3. Comparison of the set-theoretic fusion (left) and the proposed semantic fusion of multimodal sensors (right)

Finally, the top layer consists of higher level composite
patterns that are represented as digraphs where the atomic
patterns (AP) are modeled as nodes and dependencies between
nodes are modeled as relational patterns (RP). An illustrative
example in Fig. 3 compares the set-theoretic fusion (left)
with the proposed co-dependence aware fusion (right). The
definition  of  the  composite  pattern  is  as  follows. 

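Definition 3.1 (Composite pattern representation): Let
$L = \{\mathcal{L}_1, \mathcal{L}_2, \ldots, \mathcal{L}_n\}$ be the atomic pattern library and let
$L^* \subset 2^{L}$ be the set of allowable primitives for a class. Then,
a composite pattern library is $\mathbb{G} = \{\mathcal{G}_1, \ldots, \mathcal{G}_M\}$, where
the $i$th composite pattern $\mathcal{G}_i$ is a digraph $\mathcal{G}_i = (\mathcal{L}_{V_i}, \mathcal{E}_i)$;
$\mathcal{L}_{V_i} \subseteq L$ with the index set $V_i \subset \{1, 2, \ldots, N\}$ and
$\mathcal{E}_i = \{\mathcal{R}_{jk}, \ldots\}$ is a set of relational PFSAs.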

The relational PFSAs are discovered by cross D-Markov
machine [13] construction to determine the respective
cross-dependence; the algorithm is described in Section III-C.
Absence of a directed edge in the composite pattern digraph would
be represented by a single-state machine for the relational PFSA,
which implies the lack of prediction capability of a target state
by the parent state.

A.  Sensor  Signal  Conditioning  and  Transformation 

This section presents the procedure for generation of
wavelet coefficients, i.e., an image in the scale-shift domain,
denoted as ‘wavelet image’ in the sequel, from observed sensor
time series for construction of symbolic representations of the
underlying dynamics. In this SDF-based procedure, a crucial
step is partitioning of the phase space for symbol sequence
generation. Various partitioning techniques have been reported
in the literature, and a brief review is given in [14].

In wavelet-based partitioning, time series are first
transformed to the wavelet domain, where wavelet coefficients are
generated at different time shifts and scales. The choice of
the  wavelet  basis  function  and  wavelet  scales  depends  on  the 
time-frequency  characteristics  of  individual  signals. 

For every wavelet, there exists a certain frequency called
the center frequency $F_c$ that has the maximum modulus in the
Fourier transform of the wavelet. The pseudo-frequency $f_p$ of
the wavelet at a particular scale $\alpha$ is given by the following
formula:

$$f_p = \frac{F_c}{\alpha \, \Delta t} \qquad (1)$$

where $\Delta t$ is the sampling interval. Then the scales can be
calculated as follows:

$$\alpha^i = \frac{F_c}{f_p^i \, \Delta t} \qquad (2)$$

where $i = 1, 2, \ldots$, and $f_p^i$ are the frequencies that can be
obtained by choosing the locally dominant frequencies in the
Fourier transform.
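As a minimal sketch of Eqs. (1) and (2), the following Python fragment picks locally dominant frequencies from the Fourier magnitude spectrum and converts them into wavelet scales. The function names, the peak-picking rule (local maxima of the magnitude spectrum, ranked by magnitude), and any numerical value of $F_c$ are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def pseudo_frequency(scale, center_freq, dt):
    """Eq. (1): pseudo-frequency f_p = F_c / (scale * dt)."""
    return center_freq / (np.asarray(scale, dtype=float) * dt)


def scales_from_frequencies(freqs, center_freq, dt):
    """Eq. (2): scale_i = F_c / (f_p^i * dt), i.e., Eq. (1) solved for the scale."""
    return center_freq / (np.asarray(freqs, dtype=float) * dt)


def dominant_frequencies(x, fs, k):
    """Return up to k locally dominant frequencies (Hz), sorted in ascending order.

    Assumed rule: a bin is 'locally dominant' if its FFT magnitude is a local
    maximum; the k largest such peaks are kept.
    """
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x - x.mean()))        # remove DC before the FFT
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    interior = np.arange(1, mag.size - 1)
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    peaks = interior[is_peak]
    top = peaks[np.argsort(mag[peaks])[::-1][:k]]  # strongest peaks first
    return np.sort(freqs[top])
```

For example, for a signal sampled at 1 kHz with components at 5 Hz and 20 Hz, and an assumed center frequency of 0.8125, `scales_from_frequencies` returns the scales 162.5 and 40.625; feeding these back into `pseudo_frequency` recovers 5 Hz and 20 Hz.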

Figure  4  shows  an  illustrative  example  of  transformation  of 
the time series in Fig. 4(a) to the two-dimensional wavelet
image in Fig. 4(b); the amplitudes of the wavelet coefficients over
the  scale-shift  domain  are  plotted  as  a  surface.  Subsequently, 
symbolization  of  this  wavelet  surface  leads  to  the  formation 
of  a  symbolic  image  as  shown  in  Fig.  4(c). 
Figure 4. Symbol image generation via wavelet transform of the sensor time series data and partition of the wavelet surface in the ordinate direction


B.  Symbolization  of  Wavelet  Surface  Profiles 

This  section  presents  partitioning  of  the  wavelet  surface 
profile,  as  shown  in  Fig.  4(b),  which  is  generated  by  the 
coefficients  over  the  two-dimensional  scale-shift  domain,  for 
construction of the symbolic image in Fig. 4(c). The $x$-$y$
coordinates of the wavelet surface profiles denote the shifts
and the scales respectively, and the $z$-coordinate denotes the
pixel  values  of  wavelet  coefficients  (i.e.,  the  surface  height). 

Definition 3.2 (Wavelet Surface Profile): Let
$\mathcal{H} = \{(i, j) : i, j \in \mathbb{N},\ 1 \le i \le m,\ 1 \le j \le n\}$
be the set of coordinates
consisting of $(m \times n)$ pixels denoting the scale-shift data
points. Let $\mathcal{R}$ denote the interval that spans the range of
wavelet coefficient amplitudes. Then, a wavelet surface profile
is defined as

$$\mathcal{S} : \mathcal{H} \rightarrow \mathcal{R} \qquad (3)$$

Definition 3.3 (Symbolization): Given the symbol alphabet
$\Sigma$, let the partitioning of the interval $\mathcal{R}$ be defined by a map
$P : \mathcal{R} \rightarrow \Sigma$. Then, the symbolization of a wavelet surface
profile is defined by a map $\mathcal{S}_\Sigma = P \circ \mathcal{S}$ such that

$$\mathcal{S}_\Sigma : \mathcal{H} \rightarrow \Sigma \qquad (4)$$

that labels each pixel of the image to a symbol in $\Sigma$.

The wavelet surface profiles are partitioned such that the
ordinates between the maximum and minimum of the
coefficients along the $z$-axis are divided into regions by different
planes parallel to the $x$-$y$ plane. For example, if the alphabet
is chosen as $\Sigma = \{a, b, c, d\}$, i.e., $|\Sigma| = 4$, then three
partitioning planes divide the ordinate (i.e., $z$-axis) of the
surface profile into four mutually exclusive and exhaustive
regions, as shown in Fig. 4(b). These disjoint regions form
a partition, where each region is labeled with one symbol
from the alphabet $\Sigma$. If the intensity of a pixel is located in a
particular region, then it is coded with the symbol associated
with that region. As such, a symbol from the alphabet $\Sigma$ is
assigned to each pixel corresponding to the region where its
intensity falls. Thus, the two-dimensional array of symbols,
called symbol image, is generated from the wavelet surface
profile, as shown in Fig. 4(c).

The surface profiles are partitioned by using either the
maximum entropy partitioning (MEP) or the uniform partitioning
(UP) methods [14]. If the partitioning planes are separated by
equal-sized intervals, then the partition is called the uniform
partitioning (UP). Intuitively, it is more reasonable to partition
the information-rich regions of a data set more finely
and those with sparse information more coarsely. To
achieve this objective, the MEP method has been adopted such
that the entropy of the generated symbols is maximized. In
general, the choice of alphabet size depends on the specific data
set. The partitioning of wavelet surface profiles to generate
symbolic representations enables robust feature extraction,
and symbolization also significantly reduces the memory
requirements. For the purpose of pattern classification, the
reference data set is partitioned with alphabet size $|\Sigma|$, and the
partition is subsequently kept constant. In other words, the structure of the
partition is fixed at the reference condition and this partition
serves as the reference frame for subsequent data analysis [11].
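The two partitioning schemes and the symbolization map of Definition 3.3 can be sketched as follows. This is an illustrative sketch only: it assumes that MEP is realized by equal-frequency (quantile) boundaries, so that each symbol occurs about equally often in the reference data, and it encodes symbols as integers 0, ..., $|\Sigma| - 1$ rather than letters. All function names are hypothetical.

```python
import numpy as np


def uniform_partition(reference, n_symbols):
    """UP: partitioning planes equally spaced between the min and max of the reference surface."""
    lo, hi = float(np.min(reference)), float(np.max(reference))
    return np.linspace(lo, hi, n_symbols + 1)[1:-1]


def max_entropy_partition(reference, n_symbols):
    """MEP (assumed realization): planes at equal-frequency quantiles of the reference surface."""
    probs = np.arange(1, n_symbols) / n_symbols
    return np.quantile(np.ravel(reference), probs)


def symbolize(surface, planes):
    """Label every pixel with the index of the region between planes that contains it."""
    return np.digitize(surface, planes)


def symbol_entropy(symbols, n_symbols):
    """Shannon entropy (bits) of the empirical symbol distribution."""
    counts = np.bincount(np.ravel(symbols), minlength=n_symbols)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

The planes are computed once from the reference (partition) data and then reused unchanged when calling `symbolize` on training and testing surfaces, which mirrors the fixed reference frame described above. Coefficients outside the reference range fall into the lowest or highest region.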

C.  Construction  of  PFSA  for  Feature  Extraction 

This  section  presents  construction  of  a  probabilistic  finite 
state  automaton  (PFSA)  for  feature  extraction  based  on  the 
symbol  image  generated  from  a  wavelet  surface  profile. 

For analysis of (one-dimensional) time series, a PFSA is
constructed such that its states represent different combinations
of blocks of symbols in the symbol sequence. The edges
connecting these states represent the transition probabilities
between these blocks [11]. Therefore, for analysis of
(one-dimensional) time series, the ‘states’ denote all possible symbol
blocks (i.e., words) within a window of certain length. Let us
now extend the notion of ‘states’ to a two-dimensional domain
for analysis of wavelet surface profiles.

The concept of D-Markov machine has been introduced by
the authors in their previous publications under the framework
of Symbolic Dynamic Filtering (SDF) [11], [14] to extract
information from symbol sequences/images which are generated
from single sensors. In this paper, a generalization of the
D-Markov machine is proposed, called xD-Markov machine,
which captures the symbol-level cross-dependence. The
D-Markov machine is a special case of the xD-Markov machine
in the following sense: when both symbol sequences are
the same, the relational patterns are essentially the atomic
patterns corresponding to the symbol sequence; i.e., the
xD-Markov machine reduces to a simple D-Markov machine. The
feature vectors extracted from an xD-Markov machine with two
identical symbol sequences (i.e., from a D-Markov machine) are
called Atomic Patterns (AP), and those extracted from an
xD-Markov machine with two different symbol sequences are
called Relational Patterns (RP). The digraph representation of
AP and RP is illustrated in Fig. 5.

Figure 5. Composite Pattern Digraph

The xD-Markov machines are constructed based on two
symbol sequences $\{s_1\}$ and $\{s_2\}$ obtained from two different
sensors (possibly heterogeneous) to capture the symbol-level
cross-dependence. Conversion of the two-dimensional symbol
image to a one-dimensional symbol sequence while retaining
the pertinent information is a difficult problem. One simple
way for the conversion is to stack each row in the symbol
image one after another. However, this method only works
if the two symbol images have the same
number of scales in the wavelet transform; otherwise the lengths
of the generated symbol sequences $\{s_1\}$ and $\{s_2\}$ will not be equal. More
importantly, this method may suffer information loss in the
frequency domain by only looking for relational dependency
at similar frequency bands in $\{s_1\}$ and $\{s_2\}$. It is quite
possible that the low frequency component in $\mathcal{H}_1$ has a stronger
correlation with the high frequency component in $\mathcal{H}_2$.

To  avoid  these  issues,  a  new  method  for  converting  symbol 
images  from  two  sensors  to  symbol  sequences  for  discovery 
of relational dependency is proposed. The key idea is to
exhaustively enumerate all possible combinations of rows between
the two symbol images. A formal definition is as follows:

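Definition 3.4 (Conversion): Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be the wavelet
coefficients (images) of sensor 1 and sensor 2, consisting of
$m_1 \times n$ and $m_2 \times n$ pixels, respectively. Let $W_i^j \subset \mathcal{H}_i$ be the
window that covers the $j$th scale in $\mathcal{H}_i$. Then the two symbol
sequences $\{s_1\}$ and $\{s_2\}$ are defined as

$$\{s_1\} = \big[\, \underbrace{\mathcal{S}_\Sigma(W_1^1) \cdots \mathcal{S}_\Sigma(W_1^1)}_{m_2},\ \ldots,\ \underbrace{\mathcal{S}_\Sigma(W_1^{m_1}) \cdots \mathcal{S}_\Sigma(W_1^{m_1})}_{m_2} \,\big]$$

$$\{s_2\} = \big[\, \underbrace{\mathcal{S}_\Sigma(W_2^1) \cdots \mathcal{S}_\Sigma(W_2^{m_2}),\ \ldots,\ \mathcal{S}_\Sigma(W_2^1) \cdots \mathcal{S}_\Sigma(W_2^{m_2})}_{m_1} \,\big]$$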

By implementing the conversion procedure as defined
above, the wavelet images $\mathcal{H}_1$ of $(m_1 \times n)$ pixels and $\mathcal{H}_2$ of
$(m_2 \times n)$ pixels are converted to one-dimensional symbol sequences
of the same length $(m_1 \times m_2 \times n)$. An illustration is given in
Fig. 6, where $m_1 = 1$, $m_2 = 2$, $n = 5$ and $|\Sigma_1| = |\Sigma_2| = 4$.

Figure 6. An illustration of converting symbol images to symbol sequences

A formal definition of the xD-Markov machine is as follows:

Definition 3.5 (xD-Markov): Let $\mathcal{M}_1$ and $\mathcal{M}_2$ be the PFSAs
corresponding to symbol sequences $\{s_1\}$ and $\{s_2\}$,
respectively. Then an xD-Markov machine is defined as a 4-tuple
$\mathcal{M}_{1 \to 2} = (Q_1, \Sigma_2, \delta_{12}, \Pi_{12})$ such that:

• $\Sigma_2 = \{\sigma_0, \ldots, \sigma_{|\Sigma_2|-1}\}$ is the alphabet set of symbol
sequence $\{s_2\}$

• $Q_1 = \{q_1, q_2, \ldots, q_{|\Sigma_1|^{D_1}}\}$ is the state set corresponding
to symbol sequence $\{s_1\}$, where $D_1$ is the depth for $\{s_1\}$

• $\delta_{12} : Q_1 \times \Sigma_2 \rightarrow Q_1$ is the state transition mapping that
maps the transition in symbol sequence $\{s_1\}$ from one
state to another upon arrival of a symbol in $\{s_2\}$

• $\Pi_{12}$ is the symbol generation matrix of size $|Q_1| \times |\Sigma_2|$;
the $ij$th element of $\Pi_{12}$ denotes the probability of finding
the $j$th symbol in $\{s_2\}$ while making a transition from the $i$th
state in the symbol sequence $\{s_1\}$

In practice, $\Pi_{12}$ is reshaped into a vector $p_{12}$ of length
$|Q_1| \times |\Sigma_2|$ and is treated as the extracted feature vector that
is a representation of the relational dependence between $\{s_1\}$
and $\{s_2\}$. This feature vector is called a Relational Pattern
(RP). The xD-Markov machine $\mathcal{M}_{2 \to 1}$ and the corresponding
feature vector $p_{21}$ are defined similarly. Fig. 5 schematically
describes the basic concept of the xD-Markov machine. Note that
an RP between two symbol sequences is not symmetric; therefore,
RPs need to be identified for both directions. If $\{s_1\}$
and $\{s_2\}$ are the same, then the xD-Markov machines $\mathcal{M}_{1 \to 1}$
and $\mathcal{M}_{2 \to 2}$ reduce to the simple D-Markov machine, and the
feature vector obtained from $\Pi_{11}$ or $\Pi_{22}$ is called an Atomic
Pattern (AP).
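The conversion of Definition 3.4 and the estimation of the symbol generation matrix of Definition 3.5 can be sketched together as follows. This is a minimal sketch under stated assumptions: symbols are integers starting at 0, the matrix is estimated by frequency counting, and the symbol of $\{s_2\}$ is read one time step after the current state of $\{s_1\}$. Transitions that straddle the junction of two stacked rows are counted like any other. The function names are hypothetical.

```python
import numpy as np


def images_to_sequences(img1, img2):
    """Definition 3.4: pair every row (scale) of img1 with every row of img2."""
    img1, img2 = np.asarray(img1), np.asarray(img2)
    m1, n1 = img1.shape
    m2, n2 = img2.shape
    if n1 != n2:
        raise ValueError("both symbol images must have the same number of shifts n")
    s1 = np.repeat(img1, m2, axis=0).ravel()  # each row of img1 repeated m2 times
    s2 = np.tile(img2, (m1, 1)).ravel()       # all rows of img2, repeated m1 times
    return s1, s2


def xd_markov(s1, s2, n_sym1, n_sym2, depth=1):
    """Estimate the |Q1| x |Sigma2| symbol generation matrix by counting.

    A state is the block of the last `depth` symbols of s1, encoded as a
    base-n_sym1 integer. Rows of states that never occur are left as zeros.
    """
    s1, s2 = np.asarray(s1), np.asarray(s2)
    counts = np.zeros((n_sym1 ** depth, n_sym2))
    for t in range(depth - 1, len(s1) - 1):
        state = 0
        for sym in s1[t - depth + 1 : t + 1]:
            state = state * n_sym1 + int(sym)
        counts[state, int(s2[t + 1])] += 1    # assumed: next symbol of s2
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Dimensions of the Fig. 6 illustration (m1 = 1, m2 = 2, n = 5); the symbol
# values themselves are made up for this example.
img1 = np.array([[0, 1, 2, 3, 0]])
img2 = np.array([[0, 0, 1, 1, 2], [3, 3, 2, 2, 1]])
s1, s2 = images_to_sequences(img1, img2)
p12 = xd_markov(s1, s2, 4, 4).ravel()         # relational pattern (RP)
p11 = xd_markov(s1, s1, 4, 4).ravel()         # atomic pattern (AP)
```

Calling `xd_markov(s2, s1, 4, 4)` gives the pattern in the opposite direction, which generally differs from `p12` because an RP is not symmetric.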

The set-theoretic approach falls at one end of the spectrum
of information fusion; here all relationships are excluded
and any fusion is solely done in the decision-theoretic sense
where the presence (or absence) of one or more footprints
can be used to estimate the probability of the target class
under consideration. The other end of the spectrum is to
fuse data at the lowest level and construct machines (PFSAs)
working in the product space of all sensors. This approach
would be able to extract modal dependencies before they are lost
when constructing separate machines for individual sensors or
modalities. But working in the product space carries the danger of
state space explosion, especially when the sensors and sensing
modalities are numerous [13]. The proposed approach
is a trade-off between the two ends of the spectrum and
attempts to include relational dependencies between sensing
modalities, while keeping it tractable for practical application.
A hierarchical approach ensures that composite patterns are
identified only when their constituent units at the lower level
have been observed. In the current framework we have
considered relations taken only two at a time, but we propose to
explore relations between higher order cliques as future work.

D.  Feature  Selection  and  Pattern  Classification 

Once the feature vectors are extracted from the observed
sensor time series, the next step is to classify these patterns
into different categories based on the particular application.
However, the feature vector $p_{ij}$ obtained from $\Pi_{ij}$ has the
dimension of $|Q_i| \times |\Sigma_j|$, which is still high if $|Q_i|$ and
$|\Sigma_j|$ are large. More importantly, many features in $p_{ij}$ may
be zero since some transitions never occur. Therefore, it is
necessary to perform feature selection in order to find the
most representative and discriminative features and speed up
the pattern classification process. Many standard methods can
be found in the feature selection literature [15]; a simple method
that selects the features with large inter-class separation and
small intra-class variance is adopted in this paper. Advanced
methods such as forward/backward selection [15] and mRMR
feature selection [16] may further improve the classification
result; however, this is a topic of future research and not the
focus of this paper.
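One plausible reading of this selection rule is a per-feature Fisher-type score, i.e., the ratio of between-class scatter to within-class scatter. The sketch below is an assumption made for illustration; the exact criterion used in the paper is not specified, and the function names are hypothetical.

```python
import numpy as np


def fisher_scores(X, y, eps=1e-12):
    """Per-feature ratio of between-class scatter to within-class scatter.

    X: (n_samples, n_features) matrix of feature vectors; y: class labels.
    Features that are identically zero receive a score of zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    return between / (within + eps)  # eps guards against division by zero


def select_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

The selected indices would be computed on the training patterns only and then applied unchanged to the testing patterns, e.g., `X_test[:, idx]`.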

Pattern  classification  for  personnel  detection  is  posed  as  a 
two-stage  problem,  i.e.,  the  training  stage  and  the  testing  stage. 
The  sensor  time  series  data  sets  are  divided  into  three  groups: 
i)  partition  data,  ii)  training  data,  and  iii)  testing  data.  The 
partition  data  set  is  used  to  generate  partition  planes  that  are 
used  in  the  training  and  the  testing  stages.  The  training  data 
set  is  used  to  generate  the  training  patterns  of  different  classes 
for  the  pattern  classifier.  Multiple  sets  of  training  data  are 
obtained  from  independent  experiments  for  each  class  in  order 
to  provide  a  good  statistical  spread  of  patterns.  Subsequently, 
the  class  labels  of  the  testing  patterns  are  generated  from 
testing  data  in  the  testing  stage.  The  partition  data  sets  may 
be  part  of  the  training  data  sets,  whereas  the  training  data  sets 
and  the  testing  data  sets  must  be  mutually  exclusive. 

The partition data are wavelet-transformed with appropriate
scales to convert the one-dimensional numeric time series data
into the wavelet image. The corresponding wavelet surface
is analyzed using the maximum entropy principle [14] to
generate the partition planes that remain invariant for both the
training and the testing stages. The scales used in the wavelet
transform of the partition data also remain invariant during
the wavelet transform of the training and the testing data.
In the training stage, the wavelet surfaces are generated by
transformation of the training data sets corresponding to different
classes. These surfaces are symbolized using the partition
planes to generate the symbol images. Subsequently, PFSAs
(either D-Markov or xD-Markov machines) are constructed
based on the corresponding symbol images, and the training
patterns are extracted from these PFSAs. Similar to the training
stage, the PFSAs and the associated patterns are generated for
different data sets in the testing stage.

Finally  a  classifier  is  trained  using  features  of  different 
classes  extracted  from  training  data  and  can  be  used  to  classify 
the  features  from  test  data  set.  There  are  plenty  of  choices 
available  for  design  of  both  parametric  and  non-parametric 
classifiers  in  literature  [17].  Among  the  parametric  type  of 
classifiers,  one  of  the  most  common  techniques  is  to  consider 
up  to  two  orders  of  statistics  in  the  feature  space.  In  other 
words,  the  mean  feature  is  calculated  for  every  class  along 
with  the  variance  of  the  feature  space  distribution  in  the 
training  set.  Then,  a  test  feature  vector  is  classified  by  using 
the Mahalanobis distance or the Bhattacharyya distance of
the test vector from the mean feature vector of each class.
However, these methods are less effective if the feature space
distribution cannot be described by second-order statistics
(i.e., if it is non-Gaussian in nature). In the present context, a
Gaussian feature space distribution cannot be ensured due to the
nonlinear nature of the partitioning-based feature extraction
technique. Therefore, a non-parametric classifier, such as the
k-nearest neighbors (k-NN) classifier, may be a better candidate
for this study [17]; however, in general, any other suitable
classifier, such as a support vector machine (SVM) or a Gaussian
mixture model (GMM), may also be used.
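
The non-parametric decision rule mentioned above can be sketched in a few lines of Python. This is a minimal k-NN classifier with Euclidean distance and majority voting; the choice of distance, the value of k, and the toy feature vectors are assumptions made for illustration and are not taken from this paper.

```python
import math
from collections import Counter


def knn_classify(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors.

    Euclidean distance is used here as an assumption; ties in the vote are
    resolved in favor of the label of the closest neighbor among the tied ones.
    """
    neighbors = sorted(
        (math.dist(x, query), label)
        for x, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy two-dimensional feature vectors (hypothetical values).
train_features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["human", "human", "human", "animal", "animal", "animal"]

label = knn_classify(train_features, train_labels, (0.2, 0.2), k=3)
```

No distributional assumption is made about the feature space, which is why this kind of rule remains usable when the features are not Gaussian.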

IV.  Results  of  Field  Data  Analysis 

Field  data  were  collected  in  the  scenario  illustrated  in  Fig.  1. 
Multiple  data  runs  were  made  to  collect  data  sets  of  all  three 
classes,  i.e.,  no  target,  human,  and  animal.  The  data  were 
collected  over  three  days  at  different  sites.  A  brief  summary 
is  given  in  Table  I  showing  the  number  of  runs  of  each  class. 

Each data set, sampled at a frequency of $F_s = 10$ kHz, has
$1 \times 10^5$ data points that correspond to 10 seconds of
the experimentation time. In order to test the capability of the
proposed  algorithm  in  target  detection,  another  group  of  data 
were  collected  with  no  target  present.  The  problem  of  target 
detection  is  then  formulated  as  a  binary  pattern  classification, 
where  the  no  target  present  data  are  considered  as  one  class, 
and  the  others  with  target  present  (i.e.,  human  or  animal)  are 
considered to belong to the other class. The data sets used for
target detection and classification are those collected by the
seismic sensor channel that is orthogonal to the ground surface and
by the PIR sensors that are collocated with the seismic sensors.
For  computational  efficiency,  the  original  seismic  and  PIR  data 
were  downsampled  by  a  factor  of  10. 

Table I

The Number of Feature Vectors for Each Target Class in the
Data Sets Used for Three-Way Cross-Validation (Set 3)

             Day 1   Day 2   Day 3   Total
No target       50      28      32     110
Human           30      22      14      66
Animal          20       6      18      44

Figure 7. Flow chart of the problem of target detection and classification


Figure  7  depicts  the  flow  chart  of  the  proposed  detection 
and  classification  algorithm  that  is  constructed  based  on  the 
theories of symbolic dynamic filtering (SDF) and the k-nearest
neighbors (k-NN) classifier [17]. The proposed algorithm
consists  of  four  steps,  namely,  signal  preprocessing,  feature 
extraction,  detection,  and  classification,  as  shown  in  Fig.  7. 

In  the  signal  preprocessing  step,  the  DC  component  of  the 
seismic  signal  is  eliminated,  and  the  signal  is  normalized  to 
unit variance. The amplitude of the seismic signal of a horse with
a heavy payload passing far away may be similar to that
of a pedestrian passing at close distance, because
the SNR decreases rapidly with the distance between
the sensor and the target. Normalizing all signals
to unit variance makes the pattern classifier independent of
the signal amplitude, so that any discrimination is solely
texture-dependent. For PIR signals, only the DC component is
removed; normalization is not performed because the
range of PIR signals does not change.
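
The preprocessing step can be illustrated with the short Python sketch below. It removes the mean (DC component) and, optionally, scales the result to unit variance. The function name, the use of the population standard deviation, and the sample values are assumptions made for this illustration.

```python
import statistics


def preprocess(signal, normalize=True):
    """Remove the DC component and optionally scale to unit variance.

    normalize=True corresponds to the seismic case, normalize=False to the
    PIR case, where only the DC component is removed.
    """
    mean = statistics.fmean(signal)
    centered = [x - mean for x in signal]
    if not normalize:
        return centered
    std = statistics.pstdev(centered)  # population standard deviation (assumption)
    if std == 0:
        return centered  # a constant signal cannot be scaled
    return [x / std for x in centered]


# Hypothetical sample values.
seismic = preprocess([1.0, 2.0, 3.0, 4.0, 5.0])            # zero mean, unit variance
pir = preprocess([2.0, 4.0, 6.0], normalize=False)          # zero mean only
```

After this step, two seismic records that differ only by a gain factor map to the same normalized sequence, so the classifier cannot rely on amplitude alone.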

In the feature extraction step, SDF captures the signatures of
the preprocessed sensor time series and represents them as
low-dimensional feature vectors. Based on the spectral analysis
of the ensemble of seismic data at hand, a series of
pseudo-frequencies from the 1-20 Hz bands has been chosen to
generate the scales for the wavelet transform, because these bands
contain a very large part of the footstep energy [6]. Similarly,
a series of pseudo-frequencies from the 0.2-2 Hz bands has
been chosen to generate the scales for the PIR signals. Upon
generation of the scales, continuous wavelet transforms (CWT)
are performed on the seismic and PIR signals with appropriate
wavelet basis functions. The db7 wavelet is used for the seismic
signals because it matches their impulsive shape very well, and
the db1 wavelet is used for the PIR signals because they are close
to square waves. A maximum-entropy wavelet surface partitioning is
then performed. Selection of the alphabet size $|\Sigma|$ depends
on the characteristics of the signal: a small alphabet size is
robust against noise and environmental variation, while a large
alphabet size has more discriminant power for identifying
different objects. The same alphabet is used for both target
detection and classification; the issues of alphabet size
optimization and data set partitioning are not addressed in this
paper. The SDF code takes less than 1 second
to process a data set of $1 \times 10^4$ points with the
following choice of parameters: alphabet size $|\Sigma_1| = |\Sigma_2| = 8$
and number of scales $|\alpha_1| = |\alpha_2| = 4$ for seismic and PIR sensor
signals, respectively. $M_{1 \to 2}$ and $M_{2 \to 2}$ are used to form the
composite pattern.
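
To make the link between pseudo-frequencies and wavelet scales concrete, the Python sketch below uses the commonly quoted relation $a = F_c F_s / f$, where $F_c$ is the center frequency of the wavelet and $F_s$ is the sampling rate. The relation is assumed here to be the one intended in this paper. The center-frequency value, the evenly spaced choice of four pseudo-frequencies per band, and the function names are placeholders for illustration only; the 1 kHz rate follows from downsampling the 10 kHz data by a factor of 10.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive (n >= 2)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def scales_from_pseudo_frequencies(freqs_hz, center_freq, sampling_rate_hz):
    """Convert pseudo-frequencies to wavelet scales via a = F_c * F_s / f."""
    return [center_freq * sampling_rate_hz / f for f in freqs_hz]


SAMPLING_RATE_HZ = 1000.0   # 10 kHz downsampled by a factor of 10
CENTER_FREQ = 0.7           # placeholder; depends on the wavelet basis chosen

# Four pseudo-frequencies per sensor, evenly spaced (an assumption).
seismic_freqs = linspace(1.0, 20.0, 4)
pir_freqs = linspace(0.2, 2.0, 4)

seismic_scales = scales_from_pseudo_frequencies(
    seismic_freqs, CENTER_FREQ, SAMPLING_RATE_HZ
)
pir_scales = scales_from_pseudo_frequencies(
    pir_freqs, CENTER_FREQ, SAMPLING_RATE_HZ
)
```

Lower pseudo-frequencies map to larger scales, so the PIR band requires scales roughly an order of magnitude larger than the seismic band at the same sampling rate.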

The  next  step  is  to  perform  pattern  classification  on  the 
feature vectors. Two classifiers are needed, as shown in the flow
chart of Fig. 7: one for target detection to decide whether a target
is present or not, and the other to identify the target. Both
classifiers are implemented as k-NN classifiers. The available
feature vectors are divided into two sets: a training set and a
testing set. In the numerical results presented in the following
sections, a three-way cross-validation [17] is used. The data
are divided into three sets by date (Day 1, Day 2, Day 3) and
three different sets of experiments are performed:

1)  Training:  Day  1  +  Day  2;  Testing:  Day  3 

2)  Training:  Day  1  +  Day  3;  Testing:  Day  2 

3)  Training:  Day  2  +  Day  3;  Testing:  Day  1 
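
The three experiments listed above amount to a leave-one-day-out scheme. The Python sketch below enumerates the folds and, using the per-day counts from Table I, reports how many feature vectors fall into the training and testing sets of each fold. The function and variable names are hypothetical.

```python
# Feature-vector counts per day and class, copied from Table I.
COUNTS = {
    "Day 1": {"No target": 50, "Human": 30, "Animal": 20},
    "Day 2": {"No target": 28, "Human": 22, "Animal": 6},
    "Day 3": {"No target": 32, "Human": 14, "Animal": 18},
}


def leave_one_day_out(days):
    """Yield (training_days, testing_day), holding out the last day first."""
    for held_out in reversed(days):
        yield [d for d in days if d != held_out], held_out


def fold_sizes(counts):
    """Return (testing_day, n_training, n_testing) for every fold."""
    sizes = []
    for train_days, test_day in leave_one_day_out(list(counts)):
        n_train = sum(sum(counts[d].values()) for d in train_days)
        n_test = sum(counts[test_day].values())
        sizes.append((test_day, n_train, n_test))
    return sizes


for test_day, n_train, n_test in fold_sizes(COUNTS):
    print(f"test on {test_day}: {n_train} training / {n_test} testing vectors")
```

Because every feature vector belongs to exactly one day, the training and testing sets of a fold never overlap.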

Training and testing on feature vectors from different days
is very meaningful in practice. In each run of cross-validation,
no prior information is assumed for the testing site or testing
data. The classifiers' capability to generalize to an independent
data set is thoroughly tested in the three-way cross-validation.

Table II

Confusion Matrices of the Three-Way Cross-Validation
Results using Seismic, PIR and Fusion of the Two Sensors

Seismic Sensor   No target   Human   Animal
No target               76       5       29
Human                   16      29       21
Animal                   6      13       25

PIR Sensor       No target   Human   Animal
No target              110       0        0
Human                    3      51       12
Animal                   0       9       35

Sensor Fusion    No target   Human   Animal
No target              110       0        0
Human                    1      52       13
Animal                   0       1       43

Table III

Comparison of the Detection and Classification Accuracy by
using Seismic, PIR and Fusion of the Two Sensors

                 Seismic     PIR   Fusion
Detection          74.5%   98.6%    99.5%
Classification     61.4%   80.4%    87.2%

As indicated in Fig. 7, the following cases are tested:

1)  Detection  of  target  presence  against  target  absence; 

2)  Classification  of  target  type,  i.e.,  Human  vs.  Animal. 

Table II shows the confusion matrices of the three-way
cross-validation using the seismic sensor, the PIR sensor, and the
fusion of the two sensors. The shaded area in Table II, i.e., the
Human/Animal block of each matrix, represents the confusion
matrices of target classification. Table III summarizes the
detection and classification accuracy derived from Table II. It is
observed that the seismic sensor does not perform well when training
and testing are conducted at different test sites. This is because
the seismic sensor is not site-independent; variation in ground
impedance and texture may affect the performance in target detection
and classification. PIR sensors are almost site-independent and
achieve much higher accuracy than seismic sensors in both
detection and classification. By using the composite pattern
generated by fusing the signals from the seismic and PIR sensors,
the detection and classification results are further improved.
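
The accuracies in Table III can be recomputed from the confusion matrices in Table II, as the Python sketch below shows. It assumes that rows are true classes and columns are assigned classes, which is consistent with the row sums matching the class totals in Table I (110, 66, 44). Detection accuracy counts a target as detected whenever it is assigned to either target class; classification accuracy is computed over the Human/Animal block only. The function names are hypothetical.

```python
# Confusion matrices from Table II; class order: No target, Human, Animal.
CONFUSION = {
    "Seismic": [[76, 5, 29], [16, 29, 21], [6, 13, 25]],
    "PIR": [[110, 0, 0], [3, 51, 12], [0, 9, 35]],
    "Fusion": [[110, 0, 0], [1, 52, 13], [0, 1, 43]],
}


def detection_accuracy(cm):
    """Fraction of samples correctly labeled as target absent or target present."""
    total = sum(sum(row) for row in cm)
    target_as_target = sum(cm[i][j] for i in (1, 2) for j in (1, 2))
    return (cm[0][0] + target_as_target) / total


def classification_accuracy(cm):
    """Fraction of detected targets assigned to the correct target class."""
    detected = sum(cm[i][j] for i in (1, 2) for j in (1, 2))
    return (cm[1][1] + cm[2][2]) / detected


for name, cm in CONFUSION.items():
    det = 100 * detection_accuracy(cm)
    cls = 100 * classification_accuracy(cm)
    print(f"{name}: detection {det:.1f}%, classification {cls:.1f}%")
```

For the seismic sensor, for example, 164 of the 220 samples are correctly detected and 54 of the 88 detected targets are correctly classified.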

V.  Summary,  Conclusions  and  Future  Work 

This paper presents a feature-level fusion method for
personnel detection using multimodal sensors. These features are
extracted as statistical patterns by constructing xD-Markov
machines from time series data collected from multimodal
sensors. An appropriate selection of the basis function and
the scale range allows the wavelet-transformed signal to be
de-noised relative to the original noise-contaminated signal
before partitioning of the resulting wavelet image for symbol
generation. The xD-Markov machine identifies the
cross-dependencies among different sensors and mitigates loss of
significant information as compared to set-theoretic
information fusion methods. A distinct advantage of the proposed
method  is  that  the  low-dimensional  feature  vectors,  extracted 
from  the  xD-Markov  machine,  can  be  computed  in  situ  and 
communicated  in  real  time  over  a  limited-bandwidth  wireless 
sensor  network  with  limited-memory  nodes. 

The  proposed  method  has  been  validated  on  a  set  of  field 
data  collected  from  different  locations  on  different  days.  A 
comparative  evaluation  is  performed  on  the  feature  vectors 
extracted  from  single  seismic  and  single  PIR  sensors  as  well 
as  the  composite  pattern  generated  by  fusion  of  the  seismic 
and PIR sensors using the xD-Markov machine. Results show that,
while  PIR  sensors  alone  perform  better  than  seismic  sensors 
alone,  the  co-dependence-aware  fusion  further  improves  the 
detection  and  classification  performance. 

While there are many research issues that need to be
resolved before exploring commercial applications of the proposed
method, the following topics are under active research:

•  Exploration of alternative ways for construction of relational
PFSAs from wavelet images with multiple scales;

•  Improvement of the feature selection procedure by adopting
more advanced methods [15], [16];

•  Development of algorithms to extract relational dependencies
among three or more symbol sequences;

•  Comparative evaluation of the proposed sensor fusion
method with Dempster-Shafer and Bayesian network approaches
[17] at the decision fusion level.